1  Introduction*

The design and prototyping of more complex and powerful computer systems
relies heavily on testing of new designs by simulation. Furthermore,
the relatively low-cost availability of computing equipment has introduced
simulation into many diverse areas such as agriculture, biology, and econometrics,
in addition to engineering and scientific research. The advent of
operational  parallel  and  distributed  architectures  has  increased  the  interest 
in  distributing  the  execution  of  simulation  programs  over  several  processors 
in  order  to  reduce  their  execution  time. 

Traditionally,  simulation  algorithms  have  been  identified  as  either  time- 
driven  or  event-driven.  The  time-driven  model  reflects  all  the  variations 
in  the  system  being  modeled,  provided  a  sufficiently  low  time  granularity. 
It is very efficient in modeling continuous change, but it is impractical
for modeling systems where change occurs in discrete steps. Event-driven
simulation,  on  the  other  hand,  is  efficient  in  modeling  the  asynchronous 
occurrence  of  discrete  events  in  time.  For  such  systems,  it  provides  a  good 
modeling  accuracy  and  a  higher  execution  efficiency. 

However,  the  discrete-event  simulation  algorithm  is  essentially  a  highly 
sequential  algorithm  that  relies  on  the  centralized  notion  of  an  event  queue 
and  a  simulation  time.  Distributing  these  over  several  processors  can  imply 
a  large  overhead  to  ensure  the  sequential  consistency  of  the  simulation.  A 
distributed  simulation  is  sequentially  consistent  when  events  occur  in  the 
same  order  as  in  a  sequential  version. 

The  scheme  proposed  in  this  report,  while  maintaining  a  central  event 
queue  and  a  common  simulation  time,  allows  the  concurrent  execution  of 
events over several processors and guarantees a sequentially consistent simulation.
The possible concurrency among events is detected at compile time,
and  the  run-time  overhead  is  kept  to  a  minimum. 

2  Simulation  Models 

2.1  Event-  and  Time-Driven  Simulation 

The  time-  and  event-driven  simulations  are  essentially  equivalent  algorithms. 
Both  have  the  same  modeling  power.  The  main  difference  between  them  is 
in  their  respective  expected  performance  for  a  given  problem. 

*Part of the work described in this report has appeared in [1] and [2].


In  the  time-driven  model,  the  simulation  time  is  moved  up  by  a  constant 
amount  at  each  iteration  of  the  simulation  algorithm.  At  each  step,  the 
simulator  proceeds  to  the  “next  instant.”  A  generic  algorithm  for  time- 
driven  simulation  is  as  follows: 

repeat
    t = t + Δt
    foreach element in the system do
        evaluate new state
    post global state changes
until (End-of-Simulation)

All  elements  of  the  system  are  evaluated  at  each  iteration,  regardless  of 
their  activity  status.  This  approach  is  highly  efficient  if,  on  the  average, 
a  large  fraction  of  the  system  elements  are  active  during  any  simulation 
time  interval,  and  every  time  interval  witnesses  changes  in  the  state  of  the 
system.  Provided  a  sufficiently  small  time  interval,  this  approach  can  model, 
with  high  fidelity,  systems  with  time-continuous  variables  (such  as  electric 
voltages  in  circuit  simulation). 
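The time-driven loop above can be sketched in Python as follows. This is only an illustration under stated assumptions: the names `simulate_time_driven`, `evaluate`, and `rc_decay` are hypothetical, the step count replaces the End-of-Simulation test, and the decay rule stands in for whatever element model a real simulator would use.

```python
def simulate_time_driven(state, evaluate, dt, n_steps):
    """Advance every element by dt, n_steps times; return final time and state."""
    t = 0.0
    for _ in range(n_steps):
        t += dt
        # Every element is evaluated against the old global state,
        # whether or not it is active during this interval.
        new_state = {name: evaluate(name, state, dt) for name in state}
        # Post all state changes at once.
        state = new_state
    return t, state


def rc_decay(name, state, dt, rc=1.0):
    """Hypothetical element model: one explicit Euler step of an RC discharge."""
    return state[name] * (1.0 - dt / rc)


t, final = simulate_time_driven({"v1": 1.0, "v2": 2.0}, rc_decay, dt=0.5, n_steps=2)
```

Note that the cost of one iteration is proportional to the number of elements, not to the number of elements that actually change.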

In the event-driven simulation model, only the values of elements that
have  actually  changed  are  updated.  The  simulation  time  is  moved  up  at 
each  step  to  the  “next  event”  time.  All  “future”  events  are  maintained  in 
a  simulation  time  ordered  list.  A  generic  event-driven  algorithm  can  be 
described  as  follows: 

repeat
    t_next = time(next-event)
    foreach event posted at t_next
        evaluate
        schedule any new events generated
until (End-of-Simulation)

All events scheduled at the current simulation time are retrieved and evaluated;
newly generated events are scheduled on the event list, and the simulation
time is advanced to the time of the next scheduled event. This
approach is particularly suited for modeling discrete systems where state
changes occur in discrete increments.
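A minimal Python sketch of the event-driven loop, assuming the time-ordered event list is kept in a binary heap. The function name `simulate_event_driven`, the `(time, name)` event encoding, and the `t_end` stopping rule are assumptions made for the example.

```python
import heapq


def simulate_event_driven(initial_events, evaluate, t_end):
    """Process (time, name) events in time order; return the evaluation trace.

    evaluate(time, name) returns the list of newly generated (time, name) events.
    """
    queue = list(initial_events)
    heapq.heapify(queue)
    trace = []
    while queue and queue[0][0] <= t_end:
        t_next = queue[0][0]  # advance to the next event time
        # Retrieve and evaluate every event posted at t_next.
        while queue and queue[0][0] == t_next:
            t, name = heapq.heappop(queue)
            trace.append((t, name))
            # Schedule any new events generated by the evaluation.
            for new_event in evaluate(t, name):
                heapq.heappush(queue, new_event)
    return trace


def evaluate(t, name):
    """Hypothetical model: a 'tick' reschedules itself 3 time units later."""
    return [(t + 3, "tick")] if name == "tick" else []


trace = simulate_event_driven([(0, "tick"), (4, "pulse")], evaluate, t_end=7)
```

Only the instants at which something happens are visited, which is why this model is the more efficient one when activity is sparse.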

2.2  Concurrency  in  Simulation  Models 

In  essence,  a  simulator  runs  an  algorithmic  description  of  a  system.  The 
concurrency delivered by the execution of this algorithm takes several forms:


1.  Element  concurrency  exists  when  several  elements  of  the  system  can 
be  evaluated  at  the  same  time.  It  is  specific  to  the  time-driven  model, 
where  all  elements  are  evaluated  at  every  time  step.  For  instance, 
when  several  gates  of  a  complex  logic  circuit  receive  all  their  inputs 
from  the  same  outside  elements,  the  state  of  these  gates  can  always 
be  evaluated  concurrently  without  a  risk  of  dependency.  An  example 
of  such  an  implementation  is  found  in  the  IBM  Yorktown  Simulation 
Engine [3,4,5].

2.  Time  concurrency  results  from  the  simultaneous  occurrence  of  changes 
within  the  system.  In  other  words,  it  exists  when  several  unrelated 
events  are  scheduled  to  happen  at  the  same  simulation  time,  in  an 
event-driven  environment.  Such  a  feature  can  be  found  in  both  the 
Daisy [6] and ZYCAD [7] machines.

3.  Control  concurrency  consists  of  executing  (in  a  pipelined  fashion)  the 
tasks that are at the core of the event-driven model: Retrieve, Evaluate,
and Schedule events. It has been described in [8] and implemented
in  [6]. 

4.  Object  concurrency  means  that  a  set  of  logically  related  activities  can 
be  grouped  in  an  object.  Often  these  objects  exhibit  a  low  degree  of 
interaction and thus can be evaluated in parallel with little synchronization
overhead.

2.3  Distributed  Simulation 

The aim of distributed simulation is to map the simulation model over several
loosely coupled processors, where objects execute locally and exchange
information  in  the  form  of  time-stamped  messages.  Two  main  paradigms 
have  been  proposed  that  implement  distributed  discrete-event  simulation: 
the  Network  Paradigm  [9,10]  and  the  Time-Warp  Mechanism  [11,12];  both 
are  asynchronous  algorithms. 

The  Network  Paradigm.  This  model  was  proposed  independently  by 
Peacock  et  al.  [9,13]  and  Chandy  et  al.  [14,10].  It  can  be  described  as  a 
conservative  asynchronous  mechanism.  It  is  conservative  because  it  assumes 
that  synchronization  between  any  two  events  may  be  needed  until  proof  that 
it  is  not  required. 

Simulation is modeled as a directed graph, where arcs represent messages
passing among objects and carry a monotonic, non-decreasing,
simulation-time-ordered sequence of events. Every node has a local simulation time,
otherwise called next event time. To each input link of a node corresponds a
link time, which is the value of the time stamp of the last message received
on that link. The next event in a node is chosen as the minimum link time
event.
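The link-time rule can be illustrated with a small Python sketch. It is a simplification under stated assumptions, not the algorithm of [9] or [10]: the class `ConservativeNode` and its methods are hypothetical, and the node is assumed to process a pending message only when its time stamp does not exceed the minimum link time, since no later message on any link can carry an earlier time stamp.

```python
from collections import deque


class ConservativeNode:
    """Toy node: one FIFO and one link time per input link."""

    def __init__(self, links):
        self.inbox = {link: deque() for link in links}
        self.link_time = {link: 0 for link in links}

    def receive(self, link, timestamp, payload):
        # Messages on a link arrive in non-decreasing time-stamp order.
        if timestamp < self.link_time[link]:
            raise ValueError("time stamps on a link must be non-decreasing")
        self.inbox[link].append((timestamp, payload))
        self.link_time[link] = timestamp  # time stamp of the last message received

    def next_event(self):
        """Return the next safe message, or None if the node must block."""
        safe_time = min(self.link_time.values())
        pending = [(q[0][0], link) for link, q in self.inbox.items() if q]
        if not pending:
            return None
        timestamp, link = min(pending)
        if timestamp > safe_time:
            return None  # an earlier message could still arrive on a lagging link
        return self.inbox[link].popleft()


node = ConservativeNode(["a", "b"])
node.receive("a", 5, "x")
blocked = node.next_event()   # link "b" is still at time 0
node.receive("b", 7, "y")
first = node.next_event()
```

The blocked state in the example is the source of the deadlock risk: if every node in a cycle waits on a lagging link, none of them can proceed.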

This method suffers from the possible introduction of deadlock situations.
Several schemes have been proposed as a remedy to this problem:

•  The  Link  Time  Algorithm  [9]  reduces  the  probability  of  a  deadlock. 

•  The Blocking Table Algorithm [13] allows the distributed detection of
deadlocks, although with a time complexity of O(n^3).

•  The Controller Method [14] relies on a central controller that runs a
deadlock  detection  algorithm  and  initiates  recovery,  at  the  expense  of 
a  potential  bottleneck. 

The Time-Warp Mechanism. This mechanism, proposed by Jefferson
and Sowizral [11], is an asynchronous optimistic mechanism. It is optimistic
because it assumes (hopes for) independence among events, and implements
a rollback and undo when this assumption does not hold. Essentially,
it relaxes the condition of monotonic, non-decreasing, time-stamped
messages along arcs, allowing out-of-order arrival of events between two logical
processes. This method suffers from two main drawbacks: a potential for
domino  effect  in  rollbacks,  and  a  large  space  overhead  necessary  to  maintain 
the  past  history  of  each  process. 
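The rollback idea can be sketched for a single process as follows. This is a toy illustration, not the mechanism of [11]: the class `OptimisticProcess` is hypothetical, the state update rule is arbitrary but order-sensitive, and the cancellation of messages already sent by undone events is left out entirely.

```python
class OptimisticProcess:
    """Toy logical process that undoes and replays events on a late arrival."""

    def __init__(self):
        self.state = 0
        # One (timestamp, value, state_before) record per processed event.
        # This saved history is the space overhead of the optimistic approach.
        self.history = []

    def _apply(self, timestamp, value):
        self.history.append((timestamp, value, self.state))
        self.state = self.state * 2 + value  # order-sensitive update

    def handle(self, timestamp, value):
        """Process a message; return the number of events rolled back."""
        undone = []
        # Undo every event already processed with a later time stamp.
        while self.history and self.history[-1][0] > timestamp:
            ts, v, state_before = self.history.pop()
            self.state = state_before
            undone.append((ts, v))
        self._apply(timestamp, value)
        # Replay the undone events in time-stamp order.
        for ts, v in sorted(undone):
            self._apply(ts, v)
        return len(undone)


p = OptimisticProcess()
p.handle(10, 1)
p.handle(30, 3)
rollbacks = p.handle(20, 2)  # late message forces one rollback
```

After the late message is handled, the state equals what in-order processing of time stamps 10, 20, 30 would have produced.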

Gafni [12] proposed a scheme that reduces the amount of space overhead
by implementing a garbage collection mechanism. Lavenberg et al.
[15]  present  an  analytical  evaluation  of  such  a  mechanism  for  two  processes 
mapped  onto  two  processors.  Their  results  show  that  a  good  speed-up  can 
be  obtained  when  the  probability  of  interaction  among  processes  is  low  (0.05 
or  less).  The  performance  degrades  for  larger  probabilities. 

3  Detection  of  Parallelism 

The discrete-event simulation algorithm is essentially sequential. The Network
Paradigm and the Time-Warp Mechanism aim at speeding up the
simulation by distributing the central event queue over a network of processors.
The methodology described in this report aims at detecting the
possible concurrency among events in a central event queue and thereby
exploiting any potential parallelism.

3.1  System Modeling

Let $\Sigma$ be the model of a system under simulation. It can be decomposed
into a set of independent subsystems, or disjoint objects.¹ $\Sigma$ can be seen as
the set of state variables describing a system and each $\sigma_i$ as a subset of $\Sigma$.

$$
\begin{cases}
\forall i \neq j, \quad \sigma_i \cap \sigma_j = \emptyset \\
\bigcup_{i=1}^{n} \sigma_i = \Sigma \quad (n > 1)
\end{cases}
\qquad (1)
$$

Equation 1 states that the partitions of $\Sigma$ are non-overlapping. Let $S(\Sigma)$
be the set of all possible states of $\Sigma$, and, for each subsystem (object) $\sigma_i$, let
$S(\sigma_i)$ be the set of all possible states of $\sigma_i$. Then we can express the above
partitioning as a concatenation of states

$$S(\Sigma) = (S(\sigma_1), S(\sigma_2), \ldots, S(\sigma_n))$$

An event in the system can therefore be defined as a state transition over
some subset $\sigma_i$ of $\Sigma$ occurring at time $t$. An event in $\sigma_i$ at time $t_j$ can be
described as

$$e(\sigma_i, t_j) = \{S(\sigma_i) = s_1,\ t < t_j\} \wedge \{S(\sigma_i) = s_2,\ t > t_j\}, \quad s_1 \neq s_2 \qquad (2)$$

The  evaluation  of  an  event  can  result  in  the  creation  of  other  events 
within  the  same  subsystem  or  in  other  subsystems.  These  events  are  said  to 
be  induced  by  the  event  that  was  evaluated.  The  state  of  a  subsystem  can 
therefore  be  affected  by  events  in  one  or  more  other  subsystems.  A  relation 
of causality, or functional dependence, between any two subsystems can be
defined by

Definition 1  A subsystem $\sigma_i$ is functionally dependent upon a subsystem
$\sigma_j$ iff some event $e(\sigma_i, t)$ can induce an event $e'(\sigma_j, t')$ in finite time.

This relation is denoted by $\sigma_i \Rightarrow \sigma_j$. From this definition we can model a
system with an Object-Dependency Graph, which is a directed graph
$G = (V, E)$, where

$$V = \{\sigma \subset \Sigma\}$$

is the set of nodes, and

$$E = \{(\sigma_i, \sigma_j) \mid \sigma_i \Rightarrow \sigma_j,\ \sigma_i \subset \Sigma,\ \sigma_j \subset \Sigma\}$$

is the set of edges.

¹ In this report the terms subsystem and object will be used interchangeably.

Figure 2: Object-dependency graph.

Figures 1 and 2 show the example of a simple digital circuit, its partitioning
into a set of objects A, B, C, and D, and the corresponding
object-dependency graph.

Based on the relation of functional dependence, we can define a directional
distance between two objects as

Definition 2  $\delta(\sigma_i, \sigma_j)$ = minimum possible delay between any event $e(\sigma_i, t)$
and an induced event $e'(\sigma_j, t')$.

In other words, $\delta(\sigma_i, \sigma_j)$ is the lower bound on the possible delay between
any event in $\sigma_i$ and its possible effect in $\sigma_j$. If $\delta(\sigma_i, \sigma_j) = +\infty$ then no event
in $\sigma_i$ can induce an event in $\sigma_j$.

In Figure 1, the directional distances d1, d2, and d3 correspond to the
respective delays in gates 1, 2, and 3.

Figure 3: Possible event concurrency.

3.2  Scheduling  Relations 

Two subsystems are totally independent when

$$\delta(\sigma, \sigma') = \delta(\sigma', \sigma) = +\infty$$

In other words, neither one can influence the other. Therefore, any event
occurring in one of these objects cannot influence the evaluation of any event
in the other. Two such events can obviously be evaluated concurrently. This
remark enables us to define a relation of strict independence ($\mathfrak{S}$) between
two events:

$$e(\sigma, t)\ \mathfrak{S}\ e(\sigma', t') \equiv \neg(\sigma \Rightarrow \sigma') \wedge \neg(\sigma' \Rightarrow \sigma)$$

This  means  that  two  events  are  strictly  independent  if  and  only  if  the 
respective  subsystems  to  which  they  belong  are  functionally  independent. 
Verifying  this  relation  reduces  to  finding  the  non-connected  components  of 
the  model  graph  G.  Even  though  all  the  concurrency  among  events  will 
be  detected  across  independent  subsystems,  this  relation  will  miss  most  of 
the  parallelism  resulting  from  pipelining  of  events  along  the  same  (possibly 
cyclical)  path  of  the  graph. 
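The reduction to non-connected components can be sketched with a union-find pass over the edges of the dependency graph. The function names and the example graph below are assumptions made for illustration; the graph is not the circuit of Figure 1. Edge direction is ignored, so two objects in the same component are conservatively treated as dependent.

```python
def components(nodes, edges):
    """Map each node to a representative of its connected component."""
    parent = {n: n for n in nodes}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving
            n = parent[n]
        return n

    for a, b in edges:
        parent[find(a)] = find(b)  # merge the two components
    return {n: find(n) for n in nodes}


def strictly_independent(comp, a, b):
    """Events in objects a and b are strictly independent iff no path joins a and b."""
    return comp[a] != comp[b]


# Hypothetical model graph with two disconnected parts.
nodes = ["A", "B", "C", "D", "X", "Y"]
edges = [("A", "C"), ("B", "C"), ("C", "D"), ("X", "Y")]
comp = components(nodes, edges)
```

Since the components depend only on the model graph, they can be computed once before the simulation starts.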

Indeed, when two events $e$ and $e'$ are generated inside two subsystems $\sigma$
and $\sigma'$ such that $\sigma \Rightarrow \sigma'$, their times of occurrence can be compatible with
a concurrent evaluation.

Figure 3 shows an event history with $e(\sigma, t)$ and $e'(\sigma', t')$ where $(t' - t) < \delta(\sigma, \sigma')$.
In this case, all events (e.g., a and b) that can possibly be induced by
the evaluation of $e$ will fall in the future of $e'$ and cannot affect its evaluation.
Therefore, events $e$ and $e'$ could be evaluated concurrently without affecting
the correctness of the simulation.

In general, $e(\sigma, t)$ and $e'(\sigma', t')$ can be evaluated concurrently if and only
if every consequence of evaluating $e$, i.e., every event it induces in $\sigma'$, occurs
later than $t'$. Since $\delta(\sigma, \sigma')$ is defined as the minimum delay between the two
subsystems, the following inequality must hold for the concurrent evaluation
to be correct:

$$(t' - t) < \delta(\sigma, \sigma')$$

Because of a possible cycle in the dependency graph, the reciprocal must
also hold. We define, therefore, a relation of generalized independence ($\mathfrak{R}$)
between two events (object concurrency):

$$e(\sigma, t)\ \mathfrak{R}\ e'(\sigma', t') \equiv [t - t' < \delta(\sigma', \sigma)] \wedge [t' - t < \delta(\sigma, \sigma')]$$

Therefore, determining the possible concurrency of two events amounts
to comparing the interval of time between their respective occurrences to
the time distance $\delta$ between the two subsystems to which they belong. $\mathfrak{R}$
is not an equivalence relation since it is not transitive. This implies that
deciding whether several events can be evaluated concurrently at a given
time requires the examination of all pairs of candidates.
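The generalized independence test is a pair of comparisons, as the sketch below shows. The `(time, object)` event encoding, the nested-dictionary form of the distance, and the numeric delays are assumptions chosen for the example; they also give a concrete case where the relation is not transitive.

```python
import math

INF = math.inf


def independent(delta, a, b):
    """Generalized independence of two (time, object) events.

    delta[x][y] is the directional distance from object x to object y.
    """
    (t_a, obj_a), (t_b, obj_b) = a, b
    return (t_a - t_b < delta[obj_b][obj_a]) and (t_b - t_a < delta[obj_a][obj_b])


# Hypothetical distances: A feeds B after 5 time units and C after 2.
delta = {
    "A": {"A": 0, "B": 5, "C": 2},
    "B": {"A": INF, "B": 0, "C": INF},
    "C": {"A": INF, "B": INF, "C": 0},
}

e1, e2, e3 = (0, "A"), (3, "B"), (4, "C")
pair_12 = independent(delta, e1, e2)  # 3 < 5
pair_23 = independent(delta, e2, e3)  # B and C do not interact
pair_13 = independent(delta, e1, e3)  # 4 < 2 fails
```

Here e1 is independent of e2 and e2 of e3, yet e1 is not independent of e3, which is why all pairs have to be examined.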

The concurrency among events is not limited to events that are time
consecutive. At any simulation time $t_{sim}$, the content of the event queue
can be described by a time-ordered sequence of events, where the ith event,
at simulation time $t_i$, is denoted by $e_i(t_i)$:

$$Q(t_{sim}) = (e_i(t_i)), \qquad t_{sim} \le t_i \le t_j \quad \text{for } i < j$$

In the sequence $Q(t_{sim})$, the set of independent (and therefore concurrent)
events is defined by

$$C(t_{sim}) = \{e_i(t_i) \mid \forall j < i,\ e_i\ \mathfrak{R}\ e_j\} \qquad (3)$$

The definition of the set $C(t_{sim})$ states that any event with an associated
simulation time $t_i \ge t_{sim}$ can be evaluated at simulation time $t_{sim}$ iff this
event is independent of all the events preceding it in the sequence $Q(t_{sim})$.
Obviously, the first element in the $Q(t_{sim})$ sequence is also the first element
in $C(t_{sim})$. Note that, once an event is in $C(t_{sim})$, it stays a member of the
set until it is evaluated. In other words,

$$e(t) \in C(t_0) \Rightarrow e(t) \in C(t_1), \qquad \forall\, t_0 \le t_1 \le t$$

Therefore, $C(t_0)$ constitutes the set of independent events that can be
evaluated concurrently at any simulation time $t \ge t_0$.
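The set of concurrent events can be computed directly from its definition, as in this sketch. The function names, the `(time, object)` event encoding, and the distances are assumptions for illustration; the quadratic scan over all predecessors is the simplest form and not necessarily how an implementation would organize the search.

```python
import math

INF = math.inf


def independent(delta, a, b):
    """Generalized independence of two (time, object) events."""
    (t_a, obj_a), (t_b, obj_b) = a, b
    return (t_a - t_b < delta[obj_b][obj_a]) and (t_b - t_a < delta[obj_a][obj_b])


def concurrent_set(queue, delta):
    """Events of a time-ordered queue that are independent of all their predecessors."""
    return [ev for i, ev in enumerate(queue)
            if all(independent(delta, ev, prev) for prev in queue[:i])]


# Hypothetical distances: A feeds B after 5 time units and C after 2.
delta = {
    "A": {"A": 0, "B": 5, "C": 2},
    "B": {"A": INF, "B": 0, "C": INF},
    "C": {"A": INF, "B": INF, "C": 0},
}
queue = [(0, "A"), (3, "B"), (4, "C"), (6, "B")]
ready = concurrent_set(queue, delta)
```

The head of the queue has no predecessor and is therefore always selected; the events at times 4 and 6 are held back because the event in A at time 0 could still induce something before them.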


3.3  Implementation  Considerations 

In this section we describe a scheme for the run-time detection of concurrency
among events and the parallel execution of these events. The implementation
of such a scheme relies on building a Delay Table D, which is the
description  of  the  object-dependency  graph  for  the  system  under  simulation. 
D  is  an  n  x  n  array,  where  n  is  the  number  of  objects.  The  delay  values  are 
defined  as  follows: 


The  diagonal  of  this  matrix  is  null,  since  there  is  no  delay  between  a  state 
transition  in  a  subsystem  and  its  effects  on  that  subsystem.  This  prevents 
two events pertaining to the same subsystem from being evaluated concurrently.
If the object-dependency graph is acyclic, D is an upper triangular
matrix,  the  values  in  the  lower  half  being  equal  to  infinity. 
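One plausible way to build such a table is sketched below, under assumptions that go beyond the text: the distance between two objects that are not directly connected is taken to be the smallest sum of direct delays along any path (computed here with the Floyd-Warshall algorithm), unreachable pairs get an infinite delay, and the diagonal is forced to zero as described above. The function name and the example delays are hypothetical.

```python
import math


def build_delay_table(objects, direct_delays):
    """Return an n x n table D with D[i][j] = minimum delay from object i to object j.

    direct_delays maps (source, destination) pairs to the delay on that edge.
    """
    n = len(objects)
    index = {obj: i for i, obj in enumerate(objects)}
    D = [[math.inf] * n for _ in range(n)]
    for (src, dst), delay in direct_delays.items():
        i, j = index[src], index[dst]
        D[i][j] = min(D[i][j], delay)
    for i in range(n):
        D[i][i] = 0  # null diagonal
    # Floyd-Warshall: allow every object k as an intermediate step.
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if D[i][k] + D[k][j] < D[i][j]:
                    D[i][j] = D[i][k] + D[k][j]
    return D


# Hypothetical acyclic graph: A and B feed C, and C feeds D.
objects = ["A", "B", "C", "D"]
D = build_delay_table(objects, {("A", "C"): 1, ("B", "C"): 1, ("C", "D"): 2})
```

With the objects listed in topological order, every entry below the diagonal stays infinite, matching the upper triangular shape expected for an acyclic graph.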

The  Delay  Table  is  created  at  compile  time  by  an  analysis  of  the  object- 
dependency graph, which yields the lower bounds on the delays between
dependent  objects  in  the  system.  The  set  of  concurrent  events  at  simulation 
time  t  can  be  built  at  run-time  from  the  Delay  Table  and  the  event  queue. 

On  a  multiprocessor  with  m  processors,  one  is  dedicated  to  running  the 
event  queue  management  tasks,  allowing  up  to  m  -  1  events  to  be  evaluated 
concurrently.  Hence  the  event-driven  simulation  algorithm  becomes 


for every simulation time t;
repeat
    determine the set of independent events C(t);
    schedule the execution of up to (m - 1) events from C(t);
    update the simulation time to that of the new head of the queue;
    insert any new events in the queue;
until (End-of-Simulation)

Note that this algorithm guarantees a progress rate of the simulation
at least equal to that of a sequential simulation. The head of the queue is
always a member of C(t) and is evaluated at every iteration. Therefore, the
simulation time is always updated to at least that of the next event, as in
the sequential algorithm.
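The loop above can be sketched in Python with a thread pool standing in for the m - 1 evaluation processors. Everything here is an assumption made for illustration: the names, the `(time, object)` event encoding, the dictionary form of the Delay Table, the choice of the first m - 1 members of C(t), and the requirement that `evaluate` be safe to call from several threads.

```python
from concurrent.futures import ThreadPoolExecutor
import math


def independent(delta, a, b):
    """Generalized independence of two (time, object) events."""
    (t_a, obj_a), (t_b, obj_b) = a, b
    return t_a - t_b < delta[obj_b][obj_a] and t_b - t_a < delta[obj_a][obj_b]


def run_parallel(initial_events, delta, evaluate, m, t_end):
    """Evaluate up to m - 1 independent events per iteration; return the batches."""
    queue = sorted(initial_events)
    batches = []
    with ThreadPoolExecutor(max_workers=m - 1) as pool:
        while queue and queue[0][0] <= t_end:
            # Determine C(t): events independent of all their predecessors.
            ready = [i for i, ev in enumerate(queue)
                     if ev[0] <= t_end
                     and all(independent(delta, ev, prev) for prev in queue[:i])]
            chosen = ready[:m - 1]  # always contains the head of the queue
            batch = [queue[i] for i in chosen]
            induced = list(pool.map(evaluate, batch))  # concurrent evaluation
            # Remove the evaluated events and insert the new ones.
            queue = [ev for i, ev in enumerate(queue) if i not in chosen]
            for new_events in induced:
                queue.extend(new_events)
            queue.sort()
            batches.append(batch)
    return batches


INF = math.inf
delta = {"A": {"A": 0, "B": 5}, "B": {"A": INF, "B": 0}}


def evaluate(event):
    """Hypothetical model: an event in A induces an event in B 5 time units later."""
    t, obj = event
    return [(t + 5, "B")] if obj == "A" else []


batches = run_parallel([(0, "A"), (3, "B")], delta, evaluate, m=3, t_end=10)
```

With m = 2 a single evaluation worker remains and the batches degenerate to the sequential order, which reflects the progress guarantee stated above.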

4  Conclusions


We have examined in this report the issue of simulation in a parallel environment.
Although the problem is apparently centered around the centralized
notion  of  time,  several  types  of  concurrency  can  be  found  in  simulation 
models.  Event-driven  simulation  has  been  shown  to  provide  time,  object,  or 
control  concurrency. 

Using  a  formal  description  of  a  system  under  simulation,  we  were  able 
to  define  relations  of  functional  dependence  among  objects  in  the  system,  as 
well  as  two  relations  of  strict  and  generalized  independence  between  events. 
A  compilation  strategy  has  been  described,  based  on  these  relations,  that 
allows  the  run-time  detection  of  parallelism  in  the  event-driven  simulation 
model.  This  strategy  has  been  shown  to  preserve  sequential  consistency 
among  events. 

Directions for future research include evaluation of the distribution of the
degree of parallelism that can be obtained using this method in applications
such as switch-level simulation of logic circuits and the stochastic simulation
of networks of queues. Another direction is the implementation of a parallel
simulation environment on a shared memory multiprocessor.